Add seqera:// data-links support to nf-tower filesystem#7070

Draft
jorgee wants to merge 3 commits into master from 260422-seqera-datalinks-fs

@jorgee

@jorgee jorgee commented Apr 24, 2026

Summary

Extends the seqera:// NIO filesystem in nf-tower with a second resource type, data-links. Paths of the form seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path> resolve to files and directories inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes).
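
To make the path shape concrete, here is a minimal sketch of splitting a data-link path into its components. It assumes segments are separated by "/", that empty segments (from trailing or doubled slashes) can be ignored, and that everything after <name> is the sub-path. DataLinkPathParts is a hypothetical class for illustration, not the plugin's SeqeraPath, and it only accepts paths that point at or inside a specific data-link, not the shallower provider and name listings.

```java
import java.util.Arrays;
import java.util.Optional;

/**
 * Hypothetical value type for seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>.
 * Illustrative only; not the nf-tower SeqeraPath implementation.
 */
final class DataLinkPathParts {
    static final String PREFIX = "seqera://";
    static final String RESOURCE_TYPE = "data-links";

    final String org;
    final String workspace;
    final String provider;
    final String name;
    final String subPath; // "" when the path is the data-link root

    private DataLinkPathParts(String org, String workspace, String provider, String name, String subPath) {
        this.org = org;
        this.workspace = workspace;
        this.provider = provider;
        this.name = name;
        this.subPath = subPath;
    }

    /** Returns empty unless the string points at or inside a specific data-link. */
    static Optional<DataLinkPathParts> parse(String path) {
        if (!path.startsWith(PREFIX)) return Optional.empty();
        // Split on '/', dropping empty segments produced by trailing or doubled slashes.
        String[] seg = Arrays.stream(path.substring(PREFIX.length()).split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
        // Expected layout: [org, ws, "data-links", provider, name, sub-path segments...]
        if (seg.length < 5 || !RESOURCE_TYPE.equals(seg[2])) return Optional.empty();
        String sub = String.join("/", Arrays.copyOfRange(seg, 5, seg.length));
        return Optional.of(new DataLinkPathParts(seg[0], seg[1], seg[3], seg[4], sub));
    }
}
```

For example, seqera://acme/research/data-links/aws/raw-reads/run1/a.fq splits into org acme, workspace research, provider aws, data-link raw-reads and sub-path run1/a.fq.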

Listings and attribute queries go through the Platform's /data-links/{id}/browse[/path] endpoints; byte reads go through pre-signed URLs returned by /data-links/{id}/generate-download-url and fetched with a plain JDK HttpClient. Only the Seqera access token is required: no AWS/GCP/Azure credentials are needed, and no cloud SDK dependency is introduced.
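
The two request paths described above can be sketched with nothing but java.net and java.net.http. This is a minimal sketch under assumptions: DataLinkHttp and its method names are hypothetical, the API base URL is whatever the Platform endpoint is, and the only query parameter modeled is the optional credentialsId from the Highlights list (any other parameters the real client sends, and the shape of the generate-download-url response, are not shown here). The signed-URL GET carries no Authorization header, because a pre-signed URL embeds its authorization in the query string.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helpers; names and query handling are assumptions, not the nf-tower client. */
final class DataLinkHttp {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(30))
            .build();

    /** Builds <apiBase>/data-links/{id}/browse[/<subPath>][?credentialsId=...]. */
    static URI browseUri(URI apiBase, String dataLinkId, String subPath, String credentialsId) {
        String basePath = apiBase.getPath() == null ? "" : apiBase.getPath().replaceAll("/+$", "");
        String path = basePath + "/data-links/" + dataLinkId + "/browse"
                + (subPath == null || subPath.isEmpty() ? "" : "/" + subPath);
        String query = (credentialsId == null || credentialsId.isEmpty()) ? null : "credentialsId=" + credentialsId;
        try {
            // The multi-argument constructor quotes characters that are illegal in each component.
            return new URI(apiBase.getScheme(), apiBase.getAuthority(), path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Plain GET with no Authorization header: the signature lives in the URL itself. */
    static HttpRequest signedGet(String signedUrl) {
        return HttpRequest.newBuilder(URI.create(signedUrl)).GET().build();
    }

    /** Streams the object bytes; a non-200 response (e.g. an expired URL) surfaces as IOException. */
    static InputStream openSigned(String signedUrl) throws IOException, InterruptedException {
        HttpResponse<InputStream> resp =
                CLIENT.send(signedGet(signedUrl), HttpResponse.BodyHandlers.ofInputStream());
        if (resp.statusCode() != 200) {
            resp.body().close();
            throw new IOException("Pre-signed URL request failed with HTTP " + resp.statusCode());
        }
        return resp.body();
    }
}
```

Streaming with BodyHandlers.ofInputStream keeps large objects out of memory, which fits the NIO read path better than buffering the whole body.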

As part of this change, the existing dataset-specific logic in SeqeraFileSystemProvider, SeqeraFileSystem, and SeqeraPath is extracted into a dedicated ResourceTypeHandler abstraction, with DatasetsResourceHandler and DataLinksResourceHandler as its two implementations. The generic fs/ classes become resource-type-agnostic at path depth ≥ 3 (enforced by ResourceTypeAbstractionTest).
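
The handler split can be pictured as a registry keyed by the resource-type segment: generic code inspects only that segment and delegates everything after it. The sketch below is hypothetical throughout. ResourceHandler, DataLinksSketch, DatasetsSketch and ResourceDispatcher are illustrative names, the describe method stands in for real listing and attribute operations, and "datasets" as the dataset segment name is an assumption; the actual ResourceTypeHandler interface in nf-tower may look quite different.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical handler contract, loosely modeled on the ResourceTypeHandler idea. */
interface ResourceHandler {
    /** The path segment that selects this handler, e.g. "data-links". */
    String resourceType();

    /** Describes the operation for the segments that follow the resource type. */
    String describe(List<String> rest);
}

final class DataLinksSketch implements ResourceHandler {
    public String resourceType() { return "data-links"; }

    public String describe(List<String> rest) {
        switch (rest.size()) {
            case 0: return "list providers";
            case 1: return "list data-links for " + rest.get(0);
            default: return "browse " + String.join("/", rest);
        }
    }
}

final class DatasetsSketch implements ResourceHandler {
    public String resourceType() { return "datasets"; } // assumed segment name

    public String describe(List<String> rest) {
        return rest.isEmpty() ? "list datasets" : "open dataset " + rest.get(0);
    }
}

final class ResourceDispatcher {
    private final Map<String, ResourceHandler> handlers = new HashMap<>();

    ResourceDispatcher(ResourceHandler... hs) {
        for (ResourceHandler h : hs) handlers.put(h.resourceType(), h);
    }

    /** segments = [org, ws, resourceType, ...]; only index 2 is inspected here. */
    String dispatch(List<String> segments) {
        if (segments.size() < 3) throw new IllegalArgumentException("no resource type in " + segments);
        ResourceHandler h = handlers.get(segments.get(2));
        if (h == null) throw new IllegalArgumentException("unknown resource type: " + segments.get(2));
        return h.describe(segments.subList(3, segments.size()));
    }
}
```

Adding a third resource type then means registering one more handler, with no change to the dispatcher.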

Design artifacts: spec.md, plan.md, ADR.

Highlights

  • Path shape: seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>. Provider segments are the lowercase DataLinkProvider.toString() value (aws, google, azure, …).
  • Streaming pagination: Iterator<DataLinkDto> for the data-link list; PagedDataLinkContent (eager first page, lazy subsequent pages) for data-link content browsing. Handler listings expose Iterable<Path> to DirectoryStream — no full materialization.
  • Per-path attribute caching: listings attach SeqeraFileAttributes to each emitted SeqeraPath; a follow-up readAttributes(child) returns cached values with zero API calls.
  • Reliable file-vs-directory detection: readAttributes on a sub-path lists the path's parent directory and finds the entry by name. The entry's type (FILE/FOLDER) is the authoritative signal, and if no entry matches, a NoSuchFileException is raised.
  • credentialsId forwarding: when DataLinkDto.credentials is non-empty, the first credential's id is forwarded as the credentialsId query parameter on browse and download-URL requests.
  • Memoized lookups: getDataLink(ws, provider, name) uses the server-side &search=<name> filter and memoizes the result; getDataLinkProviders(ws) memoizes the distinct providers list.
  • Error mapping: 401 → AbortOperationException; 403 → AccessDeniedException; 404 → NoSuchFileException. This matches the mapping used by the dataset client.
  • 369 unit tests pass (Spock + Mock(TowerClient)). The pre-existing dataset tests are unchanged and continue to pass.
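
The "eager first page, lazy subsequent pages" pattern from the Highlights list can be sketched as a generic Iterable. This is a hypothetical illustration, not the PagedDataLinkContent class: it assumes a fetch function taking (offset, max) and treats a short or empty page as the end of the listing, whereas the real Platform browse API may paginate with a different mechanism.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch of "eager first page, lazy subsequent pages".
 * fetch.apply(offset, max) returns at most max items starting at offset.
 */
final class LazyPagedIterable<T> implements Iterable<T> {
    private final BiFunction<Integer, Integer, List<T>> fetch;
    private final int pageSize;
    private final List<T> firstPage;

    LazyPagedIterable(BiFunction<Integer, Integer, List<T>> fetch, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.fetch = fetch;
        this.pageSize = pageSize;
        // Eager: the first request runs now, so errors such as 403/404 surface before iteration starts.
        this.firstPage = fetch.apply(0, pageSize);
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private List<T> page = firstPage;
            private int pageOffset = 0; // offset of 'page' within the whole listing
            private int index = 0;      // position inside 'page'

            @Override
            public boolean hasNext() {
                if (index < page.size()) return true;
                if (page.size() < pageSize) return false; // a short (or empty) page is the last one
                pageOffset += page.size();
                page = fetch.apply(pageOffset, pageSize); // lazy: only when iteration gets this far
                index = 0;
                return !page.isEmpty();
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page.get(index++);
            }
        };
    }
}
```

Because only one page is held at a time, a DirectoryStream backed by this kind of Iterable never materializes the full listing.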

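The file-vs-directory detection and error-mapping rules in the Highlights list can be sketched as below. Everything here is hypothetical: DataLinkLookup and Entry are illustrative names, the listing entry is assumed to carry a name, a FILE/FOLDER type and a size that match the child name exactly, and the local AbortOperationException is only a stand-in for Nextflow's class so the file compiles on its own. Mapping other status codes to a plain IOException is also an assumption.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.NoSuchFileException;
import java.util.List;

/** Hypothetical sketch of readAttributes resolution and HTTP error mapping for data-link paths. */
final class DataLinkLookup {
    enum EntryType { FILE, FOLDER }

    /** One item of a parent-directory listing, as assumed here (name, type, size). */
    static final class Entry {
        final String name;
        final EntryType type;
        final long size;

        Entry(String name, EntryType type, long size) {
            this.name = name;
            this.type = type;
            this.size = size;
        }
    }

    /** Stand-in for Nextflow's AbortOperationException so this file compiles on its own. */
    static final class AbortOperationException extends RuntimeException {
        AbortOperationException(String msg) { super(msg); }
    }

    /** 401 -> abort, 403 -> AccessDeniedException, 404 -> NoSuchFileException; others -> IOException. */
    static Exception mapStatus(int status, String path) {
        switch (status) {
            case 401: return new AbortOperationException("Unauthorized access to " + path + "; check the Seqera access token");
            case 403: return new AccessDeniedException(path);
            case 404: return new NoSuchFileException(path);
            default: return new IOException("Unexpected HTTP " + status + " for " + path);
        }
    }

    /** Finds 'name' in its parent's listing; the entry's type, not the name, decides file vs directory. */
    static Entry findEntry(List<Entry> parentListing, String name, String fullPath) throws NoSuchFileException {
        for (Entry e : parentListing) {
            if (e.name.equals(name)) return e;
        }
        throw new NoSuchFileException(fullPath);
    }

    static boolean isDirectory(Entry e) { return e.type == EntryType.FOLDER; }
}
```

Listing the parent avoids guessing from a trailing slash or an empty object, which is ambiguous across S3, GCS and Azure.
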
Requirements / prerequisites

⚠️ Platform permission: the Seqera Platform user whose access token is used to run the pipeline must have a Maintain role (or higher) on the workspace. Lower roles (e.g. View) cannot list/browse data-links through the Platform API and will see AccessDeniedException on any seqera://<org>/<ws>/data-links/... path.

  • The nf-tower plugin must be enabled, with an access token configured via tower.accessToken or TOWER_ACCESS_TOKEN.
  • The existing dataset filesystem (260310-seqera-dataset-fs) must be in master — this feature builds on it.

Known limitations

  • Signed URL expiration is not handled transparently. Very long reads that outlive the URL's validity window surface as IOException; Nextflow task retry handles recovery.
  • No per-item last-modified time: the Platform browse API does not expose one, so SeqeraFileAttributes.lastModifiedTime() returns Instant.EPOCH for data-link entries.
  • Read-only in this iteration. Write operations raise UnsupportedOperationException. The Platform's /data-links/{id}/upload endpoints are a natural future extension point.
  • No data-link write, rename, delete, or management operations (create/update/delete the data-link entity itself).
  • Single Platform endpoint per JVM (unchanged from the dataset feature).

Test plan

  • ./gradlew :plugins:nf-tower:test — all 369 tests pass (verified locally)
  • ./gradlew :plugins:nf-tower:dependencies --configuration runtimeClasspath shows no new cloud-SDK artifacts (no aws-sdk, google-cloud-storage, azure-*)
  • Manual: nextflow fs ls seqera://<org>/<ws>/data-links/ lists providers
  • Manual: nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/ lists data-link names
  • Manual: nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/<name>/ lists top-level bucket entries
  • Manual: nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<file> reports is directory: false and the correct size
  • Manual: nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<dir> reports is directory: true
  • Manual: nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<missing> raises NoSuchFileException
  • Manual: pipeline reads a file inside a data-link via file('seqera://…/data-links/<provider>/<name>/path/to/file') using only TOWER_ACCESS_TOKEN
  • Manual: verify that a Platform user with a View role (below Maintain) receives a clear AccessDeniedException

@netlify

netlify Bot commented Apr 24, 2026

Deploy Preview for nextflow-docs-staging ready!

  • 🔨 Latest commit: 6a4c7f6
  • 🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/69eb662eb7e7980008e45546
  • 😎 Deploy Preview: https://deploy-preview-7070--nextflow-docs-staging.netlify.app